LangSmithLoader

This notebook provides a quick overview for getting started with the LangSmithLoader. For detailed documentation of all LangSmithLoader features and configurations, head to the API reference.

Overview

Integration details

Class	Package	Local	Serializable	PY support
LangSmithLoader	@langchain/core	✅	beta	✅

Loader features

Source	Web Loader	Node Envs Only
LangSmithLoader	✅	❌

LangSmithLoader loads the examples of a LangSmith dataset as LangChain documents. For each example, the value stored under the configured content key of the example's inputs becomes the document's page content, and the example's remaining fields (ID, timestamps, inputs, outputs, and metadata) are kept in the document's metadata.

This guide shows how to create an example dataset in LangSmith and load it using the LangSmithLoader in LangChain.

Setup

To access the LangSmith document loader you’ll need to install @langchain/core, create a LangSmith account and get an API key.

Credentials

Sign up at https://langsmith.com and generate an API key. Once you’ve done this, set the LANGSMITH_API_KEY environment variable:

export LANGSMITH_API_KEY="your-api-key"
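A missing key may only surface as an authentication error on the first request, so it can help to check for it up front. The sketch below shows one way to do that. `requireEnvVar` is a hypothetical helper, not part of LangSmith or LangChain, and it takes the environment as a plain object so it can be exercised without touching the real process environment.

```typescript
// Hypothetical helper (not part of langsmith or @langchain/core).
type Env = Record<string, string | undefined>;

// Returns the variable's value, or throws if it is unset or blank.
function requireEnvVar(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// In a Node.js script this would be called as:
//   requireEnvVar(process.env, "LANGSMITH_API_KEY");
```

Calling it once at startup makes the failure explicit before any dataset or loader calls are made.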

Installation

The LangSmithLoader integration lives in the @langchain/core package:

tip

See this section for general instructions on installing integration packages.

npm i @langchain/core

yarn add @langchain/core

pnpm add @langchain/core

Create example dataset

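For this example, we’ll create a new dataset which we’ll use in our document loader. The code below also imports the langsmith and @faker-js/faker packages, so both need to be installed alongside @langchain/core.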

import { Client as LangSmithClient } from "langsmith";
import { faker } from "@faker-js/faker";

// The client picks up LANGSMITH_API_KEY from the environment.
const lsClient = new LangSmithClient();

const datasetName = "LangSmith Few Shot Datasets Notebook";

// Generate 10 examples with random inputs, outputs, and metadata.
const exampleInputs = Array.from({ length: 10 }, () => ({
  input: faker.lorem.paragraph(),
}));
const exampleOutputs = Array.from({ length: 10 }, () => ({
  output: faker.lorem.sentence(),
}));
const exampleMetadata = Array.from({ length: 10 }, () => ({
  companyCatchPhrase: faker.company.catchPhrase(),
}));

// Remove any dataset left over from a previous run. On a first run there is
// no such dataset and the call is expected to fail, so the error is ignored.
try {
  await lsClient.deleteDataset({ datasetName });
} catch {
  // Nothing to delete.
}

const dataset = await lsClient.createDataset(datasetName);

// inputs, outputs, and metadata are matched up by array index.
const examples = await lsClient.createExamples({
  inputs: exampleInputs,
  outputs: exampleOutputs,
  metadata: exampleMetadata,
  datasetId: dataset.id,
});

import { LangSmithLoader } from "@langchain/core/document_loaders/langsmith"

const loader = new LangSmithLoader({
  datasetName: "LangSmith Few Shot Datasets Notebook",
  // Instead of a datasetName, you can alternatively provide a datasetId
  // datasetId: dataset.id,
  contentKey: "input",
  limit: 5,
  // formatContent: (content) => content,
  // ... other options
})
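The `contentKey` and `formatContent` options are easiest to understand through the document they produce. The following is a simplified sketch of that mapping, inferred from the loaded document shown in the Load section. It is not the loader's actual implementation, and `ExampleLike`, `DocumentLike`, and `exampleToDocument` are hypothetical names used only for illustration.

```typescript
// Simplified shape of a LangSmith example; field names follow the
// metadata printed in the Load section.
interface ExampleLike {
  id: string;
  inputs: Record<string, unknown>;
  outputs?: Record<string, unknown>;
  [key: string]: unknown;
}

interface DocumentLike {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Hypothetical stand-in for the loader's mapping step (an assumption,
// not library code): pick one input field as the page content and keep
// the whole example as metadata.
function exampleToDocument(
  example: ExampleLike,
  contentKey: string,
  formatContent: (content: unknown) => string = (c) =>
    typeof c === "string" ? c : JSON.stringify(c)
): DocumentLike {
  return {
    pageContent: formatContent(example.inputs[contentKey]),
    metadata: { ...example },
  };
}

const doc = exampleToDocument(
  {
    id: "example-1",
    inputs: { input: "What is 2 + 2?" },
    outputs: { output: "4" },
  },
  "input"
);
console.log(doc.pageContent); // What is 2 + 2?
```

In this sketch, passing a custom `formatContent` only changes how the selected input value is turned into a string; the metadata is unaffected.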

Load

const docs = await loader.load()
docs[0]

{
  pageContent: 'Conventus supellex aegrotatio termes. Vapulus abscido ubi vita coadunatio modi crapula comparo caecus. Acervus voluptate tergeo pariatur conor argumentum inventore vomito stella.',
  metadata: {
    id: 'f1a04800-6f7a-4232-9743-fb5d9029bf1f',
    created_at: '2024-08-20T17:01:38.984045+00:00',
    modified_at: '2024-08-20T17:01:38.984045+00:00',
    name: '#f1a0 @ LangSmith Few Shot Datasets Notebook',
    dataset_id: '9ccd66e6-e506-478c-9095-3d9e27575a89',
    source_run_id: null,
    metadata: {
      dataset_split: [Array],
      companyCatchPhrase: 'Integrated solution-oriented secured line'
    },
    inputs: {
      input: 'Conventus supellex aegrotatio termes. Vapulus abscido ubi vita coadunatio modi crapula comparo caecus. Acervus voluptate tergeo pariatur conor argumentum inventore vomito stella.'
    },
    outputs: {
      output: 'Excepturi adeptio spectaculum bis volaticus accusamus.'
    }
  }
}

console.log(docs[0].metadata)

{
  id: 'f1a04800-6f7a-4232-9743-fb5d9029bf1f',
  created_at: '2024-08-20T17:01:38.984045+00:00',
  modified_at: '2024-08-20T17:01:38.984045+00:00',
  name: '#f1a0 @ LangSmith Few Shot Datasets Notebook',
  dataset_id: '9ccd66e6-e506-478c-9095-3d9e27575a89',
  source_run_id: null,
  metadata: {
    dataset_split: [ 'base' ],
    companyCatchPhrase: 'Integrated solution-oriented secured line'
  },
  inputs: {
    input: 'Conventus supellex aegrotatio termes. Vapulus abscido ubi vita coadunatio modi crapula comparo caecus. Acervus voluptate tergeo pariatur conor argumentum inventore vomito stella.'
  },
  outputs: { output: 'Excepturi adeptio spectaculum bis volaticus accusamus.' }
}

console.log(docs[0].metadata.inputs)

{
  input: 'Conventus supellex aegrotatio termes. Vapulus abscido ubi vita coadunatio modi crapula comparo caecus. Acervus voluptate tergeo pariatur conor argumentum inventore vomito stella.'
}

console.log(docs[0].metadata.outputs)

{ output: 'Excepturi adeptio spectaculum bis volaticus accusamus.' }

console.log(Object.keys(docs[0].metadata))

[
  'id',
  'created_at',
  'modified_at',
  'name',
  'dataset_id',
  'source_run_id',
  'metadata',
  'inputs',
  'outputs'
]
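Since each document carries both the inputs and the outputs of its example in the metadata, the loaded documents can be turned into few-shot pairs for a prompt. The sketch below assumes the document shape printed above. `LoadedDoc` and `toFewShotPair` are hypothetical names, and the sample data is made up so the snippet runs without a LangSmith connection.

```typescript
// Minimal view of a loaded document, based on the metadata keys above.
interface LoadedDoc {
  pageContent: string;
  metadata: {
    inputs: Record<string, unknown>;
    outputs?: Record<string, unknown> | null;
  };
}

// Hypothetical helper: format one document as an input/output pair.
// A missing output is rendered as an empty string.
function toFewShotPair(doc: LoadedDoc, outputKey = "output"): string {
  const output = doc.metadata.outputs?.[outputKey];
  return `Input: ${doc.pageContent}\nOutput: ${String(output ?? "")}`;
}

// Made-up stand-ins for the result of `await loader.load()`.
const sampleDocs: LoadedDoc[] = [
  {
    pageContent: "What is 2 + 2?",
    metadata: { inputs: { input: "What is 2 + 2?" }, outputs: { output: "4" } },
  },
  {
    pageContent: "Capital of France?",
    metadata: { inputs: { input: "Capital of France?" }, outputs: { output: "Paris" } },
  },
];

const fewShotBlock = sampleDocs.map((d) => toFewShotPair(d)).join("\n\n");
console.log(fewShotBlock);
```

With real data, `sampleDocs` would be replaced by the `docs` array returned by the loader.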

API reference

For detailed documentation of all LangSmithLoader features and configurations, head to the API reference.
